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Abstract 

We  describe  a  distribute<'  and  iterative  approach  to  perform  the  unitary  transformations  in  the  square 
root  information  filter  imple  uentation  of  the  Kalman  filter,  providing  an  alternative  to  the  common  QR 
factorization-based  approach(  s.  The  new  approach  is  useful  in  approximate  computation  of  filtered  estimates 
for  temporally-evolving  random  fields  defined  by  local  interactions  and  observations.  Using  several  examples 
motivated  by  computer  vision  applications,  we  demonstrate  that  near-optimal  estimates  can  be  computed 
for  problems  of  practical  importance  using  only  a  small  number  of  iterations,  which  can  be  performed  in  a 
finely  parallel  manner  over  the  spatial  domain  of  the  random  field. 
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1  Introduction 


We  describe  a  highly  parallel  approach  to  the  square  root  information  (SRI)  filter  (Bierman,  1977)  imple¬ 
mentation  of  the  Kalman  filter.  Our  motivation  for  developing  this  method  comes  from  the  field  of  image 
sequence  processing  and  computer  vision.  In  applications  such  as  the  estimation  of  motion  and  reconstruc¬ 
tion  of  surfaces  in  image  sequences  we  are  faced  with  the  problem  of  estimating  an  entire  spatial  field  at  each 
point  in  time.  For  example,  in  computation  of  “optical  flow”  (Horn  and  Schunck,  1981)  a  two-dimensional 
apparent  velocity  vector  is  to  be  estimated  in  each  pixel;  thus,  for  a  256  x  266  image  we  are  faced  with  updat¬ 
ing  more  than  100,000  (2  x  256  x  256)  variables  over  time.  While  image  processing  has  provided  the  original 
motivation  for  our  work,  problems  of  this  scale  potentially  arise  in  any  distributed  parameter  estimation  or 
control  application  in  which  estimates  of  spatially-distributed  processes  are  to  be  computed  and  tracked. 

For  problems  of  such  large  dimensions,  a  straightforward  implementation  of  recursive  estimation  equa¬ 
tions  such  as  the  Kalman  filter  is  prohibitively  expensive.  In  particular,  the  calculation,  propagation,  and 
storage  of  the  error  covariance  and  Kalman  gain  matrices  are  often  impossible.  Indeed  in  many  applica¬ 
tions,  including  the  estimation  of  optical  flow  and  surface  reconstruction,  the  measurement  matrix  can  be 
data-dependent,  requiring  on-line  calculation  of  the  filter  covariance  and  gain  —  an  even  more  unreasonable 
demand.  Consequently,  there  are  compelling  reasons  to  develop  alternate,  computationally  efficient,  and 
hopefully  near-optimal  approximations  to  the  Kalman  filtering  equations. 

The  key  to  our  approach  to  this  approximation  problem  is  that  the  inverse  of  the  square  root  of  a 
covariance  matrix  has  a  natural  interpretation  as  a  model  for  a  random  phenomenon.  As  an  illustration, 
consider  a  standard  discrete  state-space  model 


s(s)  =  as(s  —  1)  4- rw(s). 

(1) 

s(0)  =  So 

(2) 

where  aio  is  a  sero-mean  random  variable  and  iy(s),  1  <  s  <  n,  is  zero-mean  white  noise  independent  of  xo- 
If  we  let  X  be  the  vector  constructed  by  stacking  ®(0)  through  x(n)  and  let  w  be  the  vector  consisting  of 
So,  tu(l), . . . ,  u;(n),  then  our  model  becomes 

Mx  =  w  (3) 

where  M  is  a  lower  bidiagonal  matrix  capturing  both  the  dynamic  equations  (l)  and  initial  condition  (2). 
From  (3)  we  see  that  the  covariance  P  of  x  is  given  by 


P  =  M-iQM-^  (4) 

where  Q  is  the  diagonal  covariance  matrix  of  w.  Thus,  except  for  the  simple  scaling  implied  by  the  presence 
of  Q  on  the  right-hand  side  of  (4),  the  matrix  M  is  the  inverse  of  the  square  root  of  the  covariance  of  x. 
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Similarly,  if  we  have  a  2-D  random  field  f{si,S2)  described  by  the  2-D  difference  equation 

ao/(si,  S2)  +  ai/(si  +  1,  S2)  +  a.2fisi,S2  +  1)  +  03/(31  -  1,  S2)  +  04/(31,32  -  1)  =  '“'(si,  32),  (5) 

together  with  appropriate  (e.g.  Dirichlet)  boundary  conditions,  we  can  again  collect  the  variables  f{si,S2) 
into  a  vector  x  as  well  as  the  variables  u)(si,  32)  and  the  boundary  conditions  into  a  vector  w,  so  that  x  and 
w  are  once  again  related  as  in  (3).  The  associated  matrix  M,  while  not  being  lower  bi-diagonal,  is  extremely 
sparse  and,  of  course,  spatially  local. 

Having  such  sparse  structures  in  the  inverse  of  the  square  root  of  a  covariance  matrix  has  important 
consequences  for  the  interpretation  and  computation  of  the  SRI  filtering  algorithm.  Consider  the  Kalman 
filtering  problem  for  the  discrete  dynamic  system 

x(t)  =  A(t)x(t  —  1) -t- w(t)  (6) 

y(t)  =  C(t)x(t)  +  y(t)  (7) 

where  w(t)  and  y(t)  are  independent  Gaussian  white  noise  processes  with  covariances  Q(t)  and  R(t),  re¬ 
spectively.  Let  x(t)  be  the  filtered  estimate,  i.e.,  the  conditional  mean  x(t)  =  jE7[x(t)  |  y(T),r  <  t],  and 
x(f)  be  the  associated  estimation  error  with  covariance  P(t).  Define  the  SRI  matrix  r(t)  to  be  the  inverse 
of  a  square  root  of  P(t),  i.e.  a  square  matrix  such  that  r^(t)r(t)  =  P~^(t).  The  SRI  filtering  algorithm 
performs  recursive  propagation  of  r(t)  and  of  z{t)  =  r(t)x(t).  We  will  refer  to  (  z(t),  r(t)  )  as  the  SRI  pair, 
in  contrast  to  the  conditional  mean  and  covariance  pair  (  x(t),  P(t)  ). 

If  we  wish  to  compute  the  optimal  estimate  x(t)  given  the  SRI  pair,  we  need  to  solve 

r(t)  x{t)  =  z(t) .  (8) 

For  the  problems  motivating  this  work,  explicit  inversion  of  r(t)  and  even  the  exact  calculation  and  storage 
of  r(t)  are  prohibitively  complex.  However,  if  r(t),  or  more  precisely  an  adequate  approximation  of  r(t), 
were  sparse  and  banded,  then  (8)  could  be  solved  efficiently  using  various  methods  of  numerical  linear 
algebra,  such  as  successive  overrelaxation,  multigrid,  etc.  The  main  computational  issue  in  the  Kalman 
filtering  problem  is,  therefore,  how  to  time-recursively  compute  the  elements  of  the  matrix  T{t)  or  its  sparse 
approximation  efficiently.  A  spatially  distributed  method  to  perform  this  computation  is  presented  in  this 
paper.  Note  that,  as  we  have  observed,  r(t)  provides  us  with  what  can  be  viewed  as  either  a  whitening  filter 
or  a  model  for  the  estimation  error  x(t),  i.e.. 


T{t)  x(t)  =  6{t) 


(9) 


where  6{t)  is  zero-mean  with  identity  covariance.  Computation  of  r(t),  then,  corresponds  to  the  specification 
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of  a  spatial^  model  for  the  components  of  the  error  x(t)  for  each  i.  Thus,  specifying  a  sparse  approximation 
to  r(t)  corresponds  precisely  to  reduced- order  spatial  modeling  of  the  error  field  x(t). 

In  this  paper  we  present  an  iterative,  highly  parallelizable  algorithm  for  the  implementation  of  the  optimal 
SRI  filter.  In  addition  to  showing  that  this  alternative  to  the  previously-developed  SRI  filter  algorithms  does 
indeed  converge  in  general,  we  also  demonstrate  that  the  highly  parallel  structure  of  our  iterative  procedure 
naturally  leads  to  surprisingly  effective  and  computationally  efficient  algorithms  for  suboptimal  estimation 
in  situations  in  which  the  exact  computation  and  storage  of  r(t)  is  not  feasible.  In  particular,  we  show  a 
reduced-order  filtering  technique  that  constrains  the  SRI  matrix  r(t)  to  be  manageably  sparse  at  all  times. 
We  focus  most  of  our  attention  on  the  issue  of  how  to  propagate  this  sparsely  approximated  SRI  matrix, 
denoted  as  ra(t),  and  its  accompanying  vector  x^it)  Rs  ro(t)x(t)  efficiently  over  time. 

2  Model  Based  Approximation 

To  provide  a  more  precise  picture  of  the  type  of  approximation  we  seek,  let  us  consider  a  general  space-time 
estimation  problem  that  might  arise  in  distributed  parameter  estimation  problems  and  image  processing 
applications.  In  particular,  we  wish  to  estimate  an  unknown  random  field  f(a,t)  over  a  discrete  space 

3  G  and  time  t  ^  Z,  where  Z  is  the  set  of  integers,  based  on  the  dynamic  system  formulation  (6), (7). 

For  this  work  we  focus  on  the  case  where  the  space-time  dynamics  are  specified  by  a  set  of  local  interactions 
(e.g.  a  set  of  partial  differential  equations)  and  the  observations  are  correspondingly  local  (e.g.  point  or 
weighted-sum  observations).  The  state  vector  x(t)  is  defined  to  be  a  temporal  slice  of  the  random  field 
sampled  at  time  t  and  over  the  entire  spatird  domain  consisting  of  n  spatial  sites.  Equation  (6)  represents 
the  temporal  dynamics  of  this  spatio-temporal  field  (e.g.,  a  spatially  and  temporally  discretized  version  of  the 
partial  differential  equation  for  the  field),  and  (7)  specifies  the  local  measurements  of  the  field  at  some  or  all 
of  the  points  in  the  spatial  domain.  In  general  the  spatial  domain  is  a  if -dimensional  rectangular  grid,  and 
the  elements  x(i,  t),  1  <  i  <  n,  of  the  vector  x(t)  are  the  random  variables  /(s,t)  ordered  lexicographically 
according  to  the  spatial  coordinates  s  =  (si,  S2, . . . ,  s^).  Specifically,  let  s*  =  1, 2, . . . ,  n*  for  fc  =  1, 2, . . . ,  if; 
then,  we  let  x{i,t)  =  f{a,t)  where  the  index  i  and  grid  coordinates  s  have  one-to-one  correspondence 

K  k-l 

i  =  Si -i- -  1)  n  (10) 

k=2  j=l 

As  we  will  see,  for  the  cases  of  interest  in  this  work  (dominated  by  local  interactions  and  observations), 
this  lexicographical  ordering  of  the  spatial  sites  leads  the  key  matrices  —  including  A(t),  C(t),  and  r(t)  — 
to  adopt  predominantly  diagonally  banded  structures.  Organizing  the  matrix  elements  by  their  diagonal 
bands  allows  us  to  make  a  coherent  presentation  of  our  spatially  distributed  filtering  algorithm,  as  we  switch 

^  It  is  important  to  empliMiie  that  this  perspective  interprets  r(4)  as  a  model  among  the  components  of  x(t)  ot  each  instant 
of  time,  i.e.,  each  such  model  is  for  a  fixed  value  of  t. 
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back  and  forth  between  matrix  row  position  indexed  by  i  and  the  spatial  domain  spanned  by  s.  Note  that 
n  =  dim(x)  =  ni^=i  implying  a  large  dimension  of  the  state  vector  and  SRI  pair.  For  example,  filtering 
of  a  512  X  512  image  sequence  requires  us  to  contend  with  vectors  x(t)  and  z(t)  of  about  quarter  million 
elements  each  and  a  matrix  r(t)  of  square  that  dimension.  For  simplicity,  let  us  assume  that  is 

scalar-valued  for  now;  we  discuss  the  cases  where  f(a,t)  is  a  vector  field  in  Section  4.2.4. 

Estimates  of  the  random  field  can  be  obtained,  in  principle,  by  Kalman  filtering  or  smoothing  based  on 
(6), (7).  Compared  with  standard  Kalman  filtering  algorithms,  square  root  algorithms  are  known  for  their 
superior  numerical  properties  (Bierman,  1977,  Kaminski  et  al.,  1971),  especially  desirable  in  applications 
in  which  the  space-time  filters  are  the  hard-wired,  high-speed  “front-ends”  of  complex  control  systems  (e.g. 
Masaki,  1992).  With  the  typically  large  dimension  n  of  the  spatial  domain  (and  of  the  state  vector),  however, 
exact  recursion  of  the  SRI  pair,  costing  O(n^)  flops  for  each  t,  is  computationally  infeasible  in  practice.  To 
address  this  computationa  problem,  it  is  useful  to  realize  that  each  time-step  of  the  Kalman  filter  or  its 
SRI  implementation  can  br  viewed  explicitly  as  a  purely  spatird  processing  problem.  Specifically,  the  one- 
step-ahead  prediction  step  in  the  filter  corresponds  to  the  estimation  of  a  predicted  field  based  on  the 
estimate  at  the  current  time ,  together  with  a  computation  of  a  spatial  model  for  the  errors  in  this  predicted 
estimate,  as  captured  by  the  error  covariance  or  SRI  matrix.  Similarly,  the  update  step  involves  both  the 
spatial  processing  of  the  new  observations  to  update  the  predicted  field  together  with  the  updating  of  the 
corresponding  spatial  model  for  the  errors  in  this  updated  field  estimate. 

Note  that  the  solution  of  (8)  can  also  be  viewed  as  the  solution  of  a  spatial  processing  problem.  In 
particular,  let  yij  be  the  elements  of  the  matrix  r(<).  Then,  the  i*''  row  of  the  matrix  equation  (8)  is 

n 

Y^yijx(j,t)  =  z{i,t)  (11) 

>=i 

where  z{i,t)  and  x{i,t)  are  the  i***  elements  of  the  vectors  z{t)  and  x(t),  respectively,  and  where  the  index  t  is 
related  to  the  spatial  coordinates  s  via  (10).  Also,  from  (9)  we  see  that  the  spatial  model  for  the  estimation 
error  x(t)  satisfies  an  equation  exactly  as  in  (11)  but  with  x{j,t)  replaced  by  x{j,t)  and  z{i,t)  replaced  by 
the  unit  variance  spatial  white  noise  process  6{i,t). 

The  domain  of  the  summation  in  (11)  is  over  all  the  spatial  sites,  implying  that  the  computation  of  x(t) 
from  z(t)  is  a  demanding  task  and  that  the  white-noise-driven  spatial  model  for  the  estimation  error  x{t) 
has  an  order  equal  to  the  extent  of  the  entire  spatial  domain  of  interest.  This  insight  naturally  suggests  the 
idea  of  seeking  a  reduced-order,  spatially  local,  approximate  model  in  place  of  the  exact  r(t).  Specifically, 
in  such  a  model  the  support  of  the  summation  in  (11)  is  reduced  as 

^  Ti>S(i,i)  = 

j€/^i 

where  Ai  is  a  small  set  of  the  indices  for  spatial  sites  local  to  the  site  i 


(12) 

.  The  cardinality  of  the  set  Ai  roughly 
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determines  the  order  of  the  model,  which  in  our  applications  will  generally  be  taken  to  be  0(1)  rather  than 
0(n).  For  example,  in  a  “nearest-neighbor”  (Levy  et  ai,  1990)  approximation  on  a  2-D  spatial  domain 
{K  =  2)  as  in  (5),  for  each  i,  J\fi  is  the  index  set  corresponding  to  the  set  of  coordinates 


{(silS2)i  (si  +  L  52),  (si  —  1,S2),  (si,S2-bl),  (si,S2—  1)}  (13) 

where  (si,S2)  are  the  coordinates  of  site  i.  This  reduced-order  modeling  framework  is  clearly  related  to  the 
idea  of  specifying  an  approximate  local  Markov  random  field  model  (Wong,  1968)  for  a  given  spatial  process. 
For  this  reason  we  borrow  from  Markov  random  field  terminology  and  refer  to  the  set  A/i  as  a  neighborhood. 
Without  much  loss  of  generality,  we  consider  a  spatially  homogeneous  neighborhood  parameterised  by  a 
single  integer  v,  which  we  call  the  radius  of  the  neighborhood,  as  follows: 

Definition  1  (neighborhood)  £ei  ju  be  the  site  index  corresponding  to  the  spatial  coordinates  u,  and  Sj 
be  the  spatial  coordinates  corresponding  to  the  site  index  i.  Then,  for  a  given  non-negative  integer  u,  let  the 
neighborhood  Afi  be  the  set  of  site  indices  such  that 

M  =  {iu  :  In  -  s<|  <  t/}  (14) 

where  in— s|  denotes  the  “1-norm”  or  “Manhattan  distance”,  i.e.,  =  |u*  —  Sfc|. 

Thus,  by  specifying  v  (hence  the  set  of  neighborhoods  {A/i,  i  €  [1,  n]}),  we  can  approximate  (8)  based  on  the 
reduced-order  approximation  (12)  as 


ro(t)x„(t)  =  z„(t).  (15) 

In  another  words,  rtt(t)  is  obtained  from  r(t)  by  windowing,  or  by  setting  7,y  =  0  for  j  ^  A/i,'^i.  Note 
that  the  vector  z{t)  has  been  approximated  along  with  r(t)  because,  as  we  will  see  in  the  next  section, 
propagation  of  z{t)  is  coupled  to  that  of  r(t). 

Truncating  the  domain  of  summation  as  in  (12)  does  not  mean  ignoring  statistical  correlations  between 
process  elements  over  a  long  distance.  Rather,  we  have  constrained  the  order  of  the  model  used  to  cap¬ 
ture  these  correlations,  which  certainly  can  extend  over  the  entire  spatial  domain  even  when  the  spatial 
interactions  are  strictly  local  (Habibi,  1972). 

In  the  next  section  we  will  describe  the  iterative  SRI  filter  algorithm.  We  present  a  result  showing  that 
if  carried  to  completion  this  algorithm  does  indeed  converge  to  the  optimal  SRI  filter.  We  then  describe  in 
Section  4  a  constrained  version  of  the  algorithm  which  stops  far  short  of  “completion”  and  in  fact  typically 
involves  only  a  small  number  of  iterations.  For  problems  such  as  the  space-time  estimation  applications 
we  have  mentioned,  our  SRI  filter  algorithm  has  a  spatially  distributed  computational  structure  desirable 
for  processing  variables  supported  over  a  large  spatial  domain  and,  in  particular,  for  carrying  out  the  com¬ 
putations  necessary  for  propagation  of  the  approximate  SRI  peiir  (  z„(t),ra(t)  )  in  time.  Various  existing 
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implementations  of  the  SRI  filter  algorithm,  particularly  those  based  on  the  QR  factorization  (Golub  and 
van  Loan,  1989),  including  systolic  array  algorithms  (McWhirter,  1983,  Kung  and  Hwang,  1991),  do  not 
feature  such  fine-grain  parallelizability.  We  explicitly  address  the  estimation  of  space-time  processes  and 
illustrate  the  effectiveness  of  our  algorithm  with  several  examples.  Using  the  approximate  SRI  filter  algo¬ 
rithm,  near-optimal  filtered  estimates  are  obtained  experimentally  with  a  computational  cost  per  time  step 
that  is  0(n),  reduced  significantly  from  the  theoretical  cost  of  O(n^),  which  is  prohibitively  large  for  the 
typically  large  values  of  n  encountered  in  distributed  parameter  estimation  and  computer  vision  problems. 
Furthermore,  if  the  approximate  filter  is  implemented  in  parallel,  a  throughput  cost  0(1)  per  time  step  can 
be  achieved  for  the  propagation  of  (  Za(t),  ra(t)  ). 

3  Iterative  Square  Root  Filtering 

In  this  section  we  describe  a  general  iterative  algorithm  for  the  implementation  of  the  SRI  filter.  This 
general  edgorithm  will  be  used  in  Section  4  as  the  basis  for  development  of  an  efficient  near-optimal  filter 
for  space-time  estimation  problems.  To  start,  we  review  the  steps  of  SRI  filtering  for  the  system  (6), (7). 
In  the  SRI  context  the  process  and  measurement  noise  covariances  Q(t)  and  R(t)  are  typically  specified 
directly  in  terms  of  their  respective  inverse  square  roots,  W(t)  and  V(t)  (so  that  W^(t)W(t)  =  Q~^(t)  and 
V^(t)V(<)  =  R“^(t)).  Since  the  central  operations  in  SRI  filtering  involve  unitary  transformations,  we  will 
adopt  a  shorthand  notation  for  such  an  operation;  the  expression  — ►  ^2  denotes  that  the  matrix  #2  is 

obtained  by  applying  a  unitary  transformation  to  the  matrix  Also,  let  the  dimension  of  the  observation 
vector  y(i)  in  (7)  be  m,  which  is  usually  0(n).  The  computation  of  the  SRI  pair  at  time  t  from  the  pair 
at  time  (t  —  1)  involves  two  steps:  the  prediction  step,  in  which  r(t  —  1)  and  z(f  —  1)  are  predicted  ahead 
to  time  t,  and  the  update  step,  in  which  the  new  measurement  is  used  in  determining  r(t)  and  z(t).  The 
prediction  from  time  (t  —  1)  to  time  t  through  the  dynamic  equation  (6)  is  accomplished  by  the  following 
unitary  transformation,  which  nulls  the  lower-left  n  x  n  block  in  a  2n  x  {2n-\-  1)  matrix: 

r(t  -  1)  0  z(t  -  1) 

-W(t)A(t)  W(t)  0 

The  lower-right  n  x  (n-(- 1)  block  of  the  transformed  matrix  yields  the  predicted  SRI  pair  (  z(t),  r(<)  ) .  Here, 
+  ’s  denote  generically  non-zero  blocks  and  their  subscripts  indicate  the  block  sizes.  The  SRI  pair  ( 1(<),  r(t)  ) 
is  then  updated  by  the  observation  equation  (7)  using  another  unitary  transformation: 

m  m 

_V(t)C(t)  V(t)y(t) 

in  which  the  lower-left  mx  n  block  is  nulled.  The  upper  n  rows  of  the  transformed  matrix  yield  the  updated 
SRI  pair  (  z(f),  r(t)  ).  The  filtered  estimate  x(t)  is  then  obtained  from  this  updated  SRI-pair  as  the  solution  of 


r(t)  z(t) 

0  *mx  1 


(17) 


*nxn  *nxn  *nxl 

0  r(t)  z(i) 


(16) 
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(8).  Each  of  the  stages  (16), (17), (8)  costs  0{n^)  flops  in  general,  and  these  three  stages  are  the  computational 
bottlenecks  of  an  SRI  filter  algorithm.  See  (Bierman,  1977,  Kaminski  et  al.,  1971)  for  more  details. 

3.1  Unitary  transformation  by  QR  factorization 

The  unitary  transformations  (16)  and  (17)  are  commonly  performed  with  QR  factorizations  which  null  the 
selected  matrix  elements  sequentially,  in  essence  by  a  repeated  application  of  Givens  rotations^  (Golub  and 
van  Loan,  1989).  Let  U's  and  ft^’s  be  the  elements  of  two  full  rows,  respectively,  of  the  left  hand  side  matrix 
in  (16)  or  (17).  Givens  rotation  9  operates  on  two  such  rows  so  that  a  specific  element  (e.g.,  bo)  is  nulled; 


cos  9  —  sin  d 

■  ■  ■  t-i  to  ti  ■  ■  ■ 

... 

sin  9  cos  9 

1 

vH 

O 

fH 

1 

sO 

_ 1 

1 - 

o- 
1  " 

o 

1 _ 

where  9  is  evaluated  based  on  to  and  ho’,  for  convenience  we  say  “the  rotation  9  is  defined  by  {to,  6o}-” 

Since  the  purpose  of  the  two  unitary  transformation  steps  is  to  null  out  the  lower  left  block,  every  Givens 
rotation  involved  in  these  steps  is  “defined”  by  a  pair  of  elements  in  the  left-most  column  of  the  matrix 
blocks.  Let  the  left-most  column  of  blocks  in  (16)  and  (17)  be  denoted  as 

(18) 

i.e.,  the  matrix  D  plays  the  roles  of  r(<  —  1)  and  r(t),  while  E  represents  the  matrices  — W(t)A(t)  and 
V(t)C(t),  in  each  of  the  respective  steps.  The  unitary  transformation  that  nulls  the  block  E  in  each  case 
accomplishes  the  propagation  of  the  SRI  pair  as  specified  in  (16)  and  (17). 

In  a  QR  factorization-based  implementation  of  the  unitary  transformation  steps,  the  elements  are  nulled 
sequentially.  Below,  we  display  the  matrix  (18)  when  n  =  4  (for  convenience  we  let  the  block  E  to  be  square). 
In  essence,  the  QR  factorization  nulls  the  elements  marked  by  the  numbers  1  through  22  in  numeric  order, 

^  A  typical  implementation  of  the  QR  factorization  involves  a  serial  application  of  Householder  reflections  which  themselves 
can  be  considered  as  series  of  Givens  rotations. 


D 

E 
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yielding  an  upper  triangular  matrix  as  the  output: 


"1 

* 

* 

* 

* 

7 

d2 

* 

* 

♦ 

♦ 

6 

13 

<^3 

* 

♦ 

* 

5 

12 

18 

_ ^ 

* 

4 

11 

17 

22 

3 

10 

16 

21 

2 

9 

15 

20 

1 

8 

14 

19 

(19) 


The  elements  along  the  main  diagonal  of  block  D,  marked  by  did^d^d^,  are  always  involved  in  defining 
the  Givens  rotations.  The  '^R  factorization  is  accomplished  by  a  sequence  of  Givens  rotations  defined  by 
{di,  1}  ,  {di,  2}  , . . . ,  {di,  7}  ,  {(^2,  8}  )  {d2,  9}  , . . . ,  etc. 

The  QR  factorization  con  nletes  each  of  the  unitary  transformations  in  a  finite  number  of  Givens  rotations 
and  allows  pipelined  computation  on  systolic  arrays  (McWhirter,  1983,  Kung  and  Hwang,  1991).  In  terms 
of  space-time  estimation,  however,  such  an  approach  is  computationally  unattractive  and  often  infeasible, 
because  of  the  structure  of  the  computations  required  in  standard  QR  factorization  algorithms.  In  particular, 
the  strictly  ordered  nulling  procedure  inherent  in  QR  methods,  when  applied  to  space-time  processes,  yields  a 
computational  structure  that  is  spatially  sequential,  implying  that  the  variables  at  certain  spatial  sites  cannot 
be  processed  until  processing  at  every  other  site  is  completed.  Such  an  approach  has  obvious  disadvantages  for 
problems  defined  on  spatial  grids  of  even  moderate  size.  Moreover,  in  the  QR  factorization-based  algorithms 
the  SRI  matrices  are  triangular  matrices  so  that  the  inversion  in  (8)  is  usually  accomplished  by  back- 
substitution  (Golub  and  van  Loan,  1989),  another  spatially  sequential  procedure.  For  space-time  problems, 
it  is  usually  preferable  to  seek  algorithms  with  spatially  local  and  highly  parallelizable  structure.  For  example, 
if  the  matrix  r(t)  is  tridiagond  (or  block  tridiagonal  with  tridiagonal  blocks,  as  it  is  for  nearest-neighbor 
models  over  2-D  spatial  domains),  (8)  can  often  be  efficiently  solved  using  an  iterative  method,  such  as 
successive  over-relaxation  (SOR)  (Golub  and  van  Loan,  1989)  or  multigrid  (Terzopoulos,  1986)  methods, 
which  are  highly  parallelizable  and  spatially  local  with  comparatively  modest  memory  requirements.  In 
general  such  methods  are  effective  and  computationally  efficient  for  inversion  of  a  sparse  set  of  equations. 
This  observation  motivates  the  objective  of  Section  4  of  obtaining  a  sparse  approximation  to  (16), (17), (8) 
with  a  resulting  sparse  approximation  to  the  SRI  matrix,  namely  Ta(t).  Of  course,  for  this  objective  to 
make  sense,  we  must  also  use  a  spatially  local  and  highly  parallel  method  to  calculate  our  approximate  SRI 
matrix,  which  we  develop  below. 
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3.2  An  iterative  unitary  transformation 

As  we  have  indicated,  a  standard  approach  to  the  computations  in  (16), (17)  uses  a  sequential  application  of 
Givens  rotations  to  null  out  the  desired  elements  one  at  a  time.  The  basic  idea  behind  our  distributed  and 
parallel  algorithm  is  to  apply  a  number  of  these  rotations  simulianeously.  As  we  will  see,  an  element  that  has 
been  nulled  at  one  step  in  this  approach  may  become  non-zero  subsequently,  in  contrast  to  the  standard  QR 
algorithm.  However,  our  algorithm  has  the  property  that  the  repeated  iterative  application  of  this  procedure 
does  in  fact  asymptotically  null  the  desired  block.  To  illustrate  the  basic  idea,  consider  an  alternative  way 
to  null  the  submatrix  E  in  matrix  (18)  —  use  the  main  diagonal  of  the  D  to  null  every  diagonal  band  in  E 
in  sequence.  The  elements  in  a  diagonal  band  in  E  can  be  nulled  simultaneously,  as  the  Givens  rotations 
are  applied  to  disjoint  pairs  of  rows  of  matrix  (18).  This  method  is  iterative:  every  diagonal  band  in  E  is 
repeatedly  nulled,  because  in  general  nulling  of  a  band  transforms  a  previously  nulled  band  elsewhere  back  to 
a  non-zero  band  (whose  elements  are  usually  smaller  in  magnitudes  than  before).  For  the  n  =  4  case  treated 
in  (19), 


the  main  diagonal  marked  by  d's  in  the  upper  block  D  nulls  the  diagonal  bands  numbered  by  1  through  7 
in  the  lower  block  E  in  sequence.  In  general  the  ordering  of  the  bands  to  be  nulled  can  be  arbitrary.  Let 
us  define  a  sweep  to  be  a  single  round  of  nullings  in  which  every  element  in  E  is  nulled  exactly  once,  e.g., 
the  seven  band- nullings  in  (20).  As  elaborated  below,  the  entire  submatrix  E  can  be  nulled  by  repeating  the 
sweep. 

To  describe  our  iterative  unitary  transformation  algorithm  more  formally,  let  D  be  an  arbitrary  n  x  n 
matrix  whose  elements  are  denoted  as  dij,  and  let  E  be  a  p  x  n  matrix  whose  elements  are  eij.  We  again 
consider  the  generic  unitary  transformation  problem  of  nulling  the  lower  submatrix  E  in  the  matrix  (18)  by 
application  of  a  series  of  Givens  rotations. 

Definition  2  (sweep)  Consider,  for  each  of  the  pn  elements  eij  of  the  submatrix  E,  a  Givens  rotation 
defined  by  {djj,eij}  and  applied  to  the  the  j*'’  row  o/D  and  i***  row  o/E  to  null  the  element  Cij.  For  matrix 
(18),  let  a  sweep  be  a  serial  application  of  pn  such  Givens  rotations  to  the  matrix. 
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In  a  sweep,  the  elements  in  E  can  be  nulled  in  any  order  as  long  as  each  element  is  nulled  once.  Also, 
note  that  the  diagonal  elements  djj  are  the  only  elements  of  submatrbe  D  that  participate  to  define  the 
Givens  rotations.  While  we  have  indicated  that  a  sweep  involves  sequential  Givens  rotations,  highly  parallel 
implementations  are  possible  by  exploiting  the  fact  that  the  Givens  rotations  can  be  applied  to  disjoint  sets 
of  rows  concurrently,  since  the  actions  of  these  rotations  do  not  interfere  with  each  other.  Specifically,  the  set 
of  elements  {e<j  :  i  —  j  =  constant}  forms  a  diagonal  band  of  the  submatrix  E,  and  the  elements  in  such  a 
band  can  be  nulled  simultaneously  as  exemplified  in  (20).  This  band-wise  implementation  of  sweep,  referred 
to  as  a  band-sweep,  plays  the  key  role  in  the  SRI  filtering  algorithms  presented  in  this  paper. 

Definition  3  (band-sweep)  A  band-sweep  is  a  special  case  of  a  sweep,  such  that  the  elements  eij  in  a 
diagonal  band  {eij  ;  i  —  j  =  constant}  of  submatrix  E  are  nulled  concurrently. 

One  price  we  pay  for  this  parallelism  is  that  elements  of  E  that  are  nulled  at  one  point  in  a  sweep  may 
be  made  nonrero  later  in  the  sweep.  That  is,  after  the  element  Cij  is  nulled,  a  subsequent  Givens  rotation 
applied  on  the  row  of  E  can  turn  Cij  non-sero.  The  following  result  (proved  in  Appendix),  however, 
assures  that  asymptotically  the  entire  submatrix  E  is  nulled. 

Theorem  1  An  iterative  application  of  sweeps  to  the  matrix  (18)  nulls  the  block  E  in  the  limit. 

In  particular,  iterations  of  the  parallelizable  band-sweeps  are  guaranteed  to  converge  and  are  applicable  to 
optimal  recursion  of  the  SRI  pair  in  a  generic  SRI  filtering  algorithm. 

Algorithm  1  (Parallel  Recursion  of  the  SRI  Pair)  In  the  unitary  transformation  steps  (16)  and  (17), 
use  iterations  of  band-sweep  to  null  the  respective  lower-left  submatrices. 

Note  that  band-sweeps  can  achieve  a  still  higher  level  of  concurrency  by  defining  the  diagonal  band  cyclically. 
For  example,  in  (20)  all  the  lower  block  elements  labeled  2  and  7  can  be  considered  to  be  a  single  cyclic 
diagonal  band  which  can  be  nulled  simultaneously,  as  are  those  labeled  3  and  6  as  well  as  4  and  5.  Such  a 
cyclical  computational  structure  might  be  useful  for  a  space-time  estimation  problem  with  a  torroidal  spatial 
domain. 

In  general,  the  number  of  iterations  required  for  convergence  of  the  unitary  transformation  algorithm 
depends  on  the  specific  values  and  structures  of  the  constituent  matrix  blocks  in  (16)  and  (17).  In  SRI  filters 
arising  in  space-time  estimation  problems  and  the  approximate  filters  based  on  them,  the  structure  inherent 
in  the  matrices  of  such  problems  allows  the  development  of  nulling  strategies  that  exploit  both  the  natural 
and  imposed  sparseness  and  bandedness  of  the  matrix  blocks.  The  end  result  is  an  extremely  efficient  unitary 
transformation  procedure  which  requires  only  3  or  4  sweeps  for  an  adequate  accuracy  as  demonstrated  later 
with  numerical  examples.  For  the  rest  of  the  paper  we  concentrate  our  discussion  on  such  space-time  filtering 
problems. 
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4  Space-Time  Estimation 


The  iterative  unitary  transformation  method  is  especially  suitable  for  the  reduced-order  approximation  of 
the  space-time  estimation  problems  described  in  Section  2,  as  the  distributed  nature  of  the  algorithm  can 
take  advantage  of  the  sparsely  banded  structure  of  these  estimation  problems.  The  dynamics  of  such  space- 
time  random  fields  are  typically  specified  by  a  set  of  local  interactions  among  the  components  of  the  field, 
usually  expressed  in  terms  of  a  set  of  stochastic  partial  difference  equations.  The  spatial  locality  of  these 
interactions  and  the  corresponding  observations  is  then  reflected  in  sparsely  banded  structures  of  the  matrices 
A(t),  W(t),  C(t),  and  V(t)  in  the  system  equations  (6), (7).  See  Section  4.2  for  illustrations  of  problems 
with  this  type  of  structure. 

Let  us  consider  the  SRI  filter  (16), (17), (8)  in  the  context  of  such  a  space-time  estimation  problem.  As 
discussed  previously,  the  computational  cost  of  O(n^)  associated  with  exact  implementation  of  each  of  these 
steps  is  impossibly  large  for  typical  values  of  n.  We  thus  seek  an  approximate  filtering  algorithm.  First,  the 
SRI  matrix  r(t)  is  sparsely  approximated  as  ra(i)  based  on  the  reduced-order  model  approximation  (12). 
As  discussed  previously,  this  approximation  makes  (8),  or  more  precisely  (15),  a  manageably  sparse  equation 
that  can  be  solved  efficiently  by  iterative  inversion  methods  whose  throughput  cost  can  be  as  low  as  0(1)  for 
the  typically  large  values  of  n  encountered  in  practice  (Terzopoulos,  1986).  For  spatial  estimation  problems 
of  practical  interest,  however,  we  cannot  calculate  or  store  the  matrices  r(t)  or  r(t)  and  thus  cannot  directly 
generate  the  approximations  to  these  matrices  by  simply  windowing  them.  What  we  desire,  then,  is  an 
algorithm  that  directly  and  efficiently  propagates  ra(t)  itself  in  time. 

4.1  A  reduced-order  filtering  algorithm 

Suppose  that  at  some  point  in  time  we  do  have  a  sparse  approximation  to  r(t)  or  T{t).  In  this  case,  notice 
that  all  the  matrix  blocks  involved  in  the  left  hand  sides  of  (16)  or  (17)  also  have  sparsely  banded  structures. 
This  insight  leads  to  the  idea  of  incorporating  another  level  of  approximation  into  the  SRI  filter  algorithm, 
beyond  that  associated  with  (12).  In  particular,  the  sparseness  and  special  structure  of  the  matrix  blocks  in 
(16), (17)  are  exploited  to  perform  the  associated  iterative  unitary  transformations  in  an  approximate  and 
highly  efficient  manner,  producing  sparsely  banded  approximations  to  r(f)  and  r(t)  directly  and  recursively  in 
time.  The  basic  idea  behind  this  second  level  of  approximation  is  to  use  the  band-sweep  algorithm  described 
in  association  with  Algorithm  1,  but  only  to  make  partial  sweeps  which  are  matched  to  and  consistent  with 
the  desired  banded  structure  of  the  matrices  that  are  to  be  maintained  in  the  approximate  filter.  Specifically, 
since  the  matrix  blocks  at  each  stage  of  the  algorithm  are  sparsely  banded  to  begin  with,  we  may  be  able  to 
efficiently  constrain  each  of  these  blocks  to  have  a  certain  sparsely  banded  structure  throughout  the  duration 
of  the  iterations  and  thus  decrease  overall  throughput  cost.  That  is,  we  limit  the  extent  of  each  band-sweep 
to  a  finite  and  typically  small  number  of  bands,  thus  reducing  dramatically  the  number  of  elements  which  are 
to  be  nulled  in  each  cycle.  In  the  space-time  problems  we  have  examined  and  will  illustrate  in  Section  4.2, 
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the  computational  cost  of  each  iteration  of  the  approximated  band-sweep  is  usually  0(1)  per  spatial  site  as 
a  result. 

Our  approximation  of  the  full  SRI  filtering  algorithm  is  characterized  by  two  types  of  neighborhood  sets, 
which  are  used  to  constrain  the  structures  of  both  the  matrix  blocks  and  the  algorithm  itself.  The  first  type 
of  neighborhood  corresponds  to  the  spatial  model  we  wish  to  use  to  describe  the  statistics  of  the  error  field; 
it  corresponds  to  the  set  of  neighborhoods  A/<  which  specify  the  desired  order  of  the  approximate  error  field 
models  in  (12).  Therefore,  these  neighborhoods  serve  to  focus  modeling  resources.  These  neighborhoods  in 
turn  imply  that  only  certain  elements  of  ra(t)  are  allowed  to  be  nonzero,  resulting  in  a  sparsely  banded 
approximation  to  the  SRI  matrix.  Thus,  we  can  identify  the  neighborhood  set  {J^i,i£  [1,h]}  as  a  con¬ 
straint  on  matrix  structure.  The  second  type  of  neighborhoods  used  by  our  approximate  filtering  algorithm 
are  strongly  linked  to  the  first  type  A/i  but  are  basically  algorithmic  in  nature.  Specifically,  consider  the 
neighborhoods  Mi  specifiei'.  by  the  radius  n  (cf.  Definition  1)  as 

Mi  =  {ju  ■  \u- Si\  <  fi}  .  (21) 

They  correspond  to  the  redu  :ed  subset  of  bands  of  the  band-sweep  algorithm  which  will  be  nulled  at  each 
stage  of  the  approximate  algorithm,  and  thus  refiects  a  focusing  of  computational  resources.  This  viewpoint 
provides  us  with  a  rational  way  of  understanding  how  our  computational  and  modeling  resources  are  linked. 
Naturally,  the  computational  resource  must  be  at  least  as  large  as  the  support  of  the  desired  model.  That 
is,  we  must  have  Mi  D  A/i,  oi  n  >  u,  for  time-recursion  of  the  reduced-order  model.  Experiments  showing 
the  effects  of  varying  choices  of  fj,  and  v  will  be  presented  in  Section  4.2. 

Definition  4  (partial  band-sweep)  A  partial  band-sweep  is  an  approximate  band-sweep  in  which  all 
the  participating  matrix  blocks  are  windowed  by  the  neighborhood  set  {Adj}.  That  is,  in  each  submatrix,  the 
{i,jy^  elements  for  j  ^  Mi  are  treated  as  zeroes  throughout  the  band-sweep  iterations. 

In  the  spirit  of  our  generic  notation  “#i  — >  $2”  for  the  exact  unitary  transformation,  we  denote  an 
approximate  unitary  transformation  performed  with  this  partial  band-sweep  operation  by  the  expression 
#1  #2-  III  general  a  matrix  block  windowed  by  the  neighborhood  set  {Mf}  has  only  0{p.^)  diagonal 

bands.  In  a  partial  band-sweep,  only  the  elements  in  these  diagonal  bands  ever  participate  in  computation 
and  need  to  be  stored,  the  rest  being  treated  as  being  identically  zero.  Thus,  for  a  small  p,,  performing  such 
a  partial  band-sweep  leads  to  a  high  throughput  rate  when  implemented  in  parallel.  With  this  motivation 
we  have  the  following  algorithm: 

Algorithm  2  (Reduced-order  Space-time  SRI  Filter)  Let  (  z^(to),  Ta{to)  )  be  a  given  initial  reduced- 
order  approximated  SRI  pair.  Specify  the  radii  p,  and  i^,  such  that  p  >  u,  to  determine  the  neighborhood 
sets  {Mi}  and  {A/i},  respectively,  hence  defining  the  extent  of  the  partial  sweep  and  subsequent  windowing 
operation.  Also,  specify  the  number  of  sweeps  to  be  performed  for  each  unitary  transformation  step.  Repeat 
the  following  steps  for  t  =  to,  {to  +  1),  (to  +  2), . . .  : 
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1.  Prediction  Step. 

Iterations  of  partial  band-sweeps  are  applied  to  approximate  the  unitary  transformation  of  the  exact 
prediction  step  (16) 


ra(<-i)  0  ^(i-1)  ^ 

-W(0A(0  W(i)  0 


*^nxn  ^nxn  ’’'nxl 


r6(<)  %,{t) 


The  lower  left  block  fnxn  o™  right  hand  side  denotes  a  generic  small,  but  non-zero,  matrix  in  this 
position  resulting  from  use  of  the  approximate  band-sweep-based  unitary  transformation. 

2.  Prediction  Windowing  Step. 

The  matrix  ri,(t)  on  the  right  hand  side  of  (22)  is  now  further  windowed  by  {M}  to  obtain  Ta(t)- 
This  windowing  assures  that  Ta{t)  will  have  the  same  band  structure  as  Tait  —  1).  The  approximate 
predicted  SRI  pair  is  then  given  by  ( l3(t),  ra(<)  ). 

3.  Update  Step. 

Iterations  of  partial  band-sweeps  are  again  applied  to  approximate  the  unitary  transformation  of  the 
exact  update  step  (17) 


ra(i)  ^{t) 

V{t)C{t)  V(t)y(t) 


^bit)  z^(i) 


^mxn  *Tnxl 


Again,  the  lower  left  block  e^xn  on  the  right  hand  side  denotes  a  generic  small,  but  non- zero,  matrix 
in  this  position  again  resulting  from  use  of  the  approximate  band-sweep-based  unitary  transformation. 

4.  Update  Windowing  Step. 

The  matrix  Tb{t)  on  the  right  hand  side  of  (23)  is  now  further  windowed  by  {A/i}  to  obtain  ra(<). 
Again,  this  windowing  assures  that  ra(t)  will  have  the  same  band  structure  as  ra(t),  thus  maintaining 
this  strnicture  throughout  the  calculations.  The  resulting,  approximate  updated  SRI  pair  is  then  given 

h  ( ^(<),ra(<) ). 

5.  Inversion  Step. 

If  needed,  the  updated  estimate  Xa(<)  may  be  obtained  by  solving  (by  an  efficient  iterative  method  such 
as  multigrid  and  SOR)  the  sparse  and  spatially  local  set  of  equations  T a{t)2,„,{t)  =  Za(^)- 

In  the  above  algorithm,  a  fixed  number  of  iterations  (of  partial  band-sweeps)  is  used  in  each  of  the  approxi¬ 
mate  unitary  transformation  steps.  Alternatively,  the  iterations  can  be  allowed  to  continue  to  convergence 
within  a  given  numerical  tolerance  level.  In  all  of  the  space-time  estimation  problems  we  have  examined, 
however,  only  a  very  small  number  (i.e.,  less  than  5)  of  band-sweeps  per  unitary  transformation  step  are 
necessary  for  reasonably  accurate  approximations.  Such  estimation  problems  are  discussed  below. 
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4.2  Numerical  Results 


The  main  purpose  of  the  numerical  examples  presented  here  is  to  examine  accuracy  of  various  approximations, 
rather  than  to  record  the  real-time  computational  speeds.  All  computations  are  performed  serially  on  general- 
purpose  workstations  by  setting  the  spatial  dimension  n  manageably  small. 

4.2.1  Partial  band-sweep 

We  first  examine  the  impact  of  different  choices  of  the  neighborhoods  on  partial  band-sweep  in  a  general 
unitary  transformation  problem.  The  approximating  parameters  in  the  partial  band-sweep  algorithm  are  the 
size  of  the  neighborhood  Alj ,  which  is  given  by  the  integer  n,  and  the  number  of  iterations  of  the  sweeps. 
Consider  a  space-time  SRI  filtering  problem  defined  over  a  1-D  spatial  domain  (i.e.,  K  =  1)  and  a  nearest- 
neighbor  reduced-order  modeling  approximation  (i.e.,  jVj  is  specified  by  =  1).  Note  that,  under  the  1-D 
nearest-neighbor  approximation,  all  the  matrix  blocks  in  the  unitary  transformation  steps  (16), (17)  of  the 
filter  are  windowed  to  be  tridiagonal.  A  key  unitary  transformation  problem  is  to  null  the  lower  block  E  in 
the  matrix  (18)  as 


D 

D 

- 

E 

0 

and  to  compute  the  tridiagonally  approximated  (windowed)  matrix  Do  of  the  resulting  n  x  n  upper  submatrix 
D.  The  computation  of  D  itself  is  approximated  by  partial  band-sweeps  defined  by  the  neighborhood  Mi 
of  various  sizes  which  are  in  turn  specified  by  the  parameter  1  <  <  n  —  1 . 

We  conduct  a  numerical  experiment  in  which  the  blocks  D  and  E  are  randomly  generated  as  25  x  25 
tridiagonal  matrices.  Let  D(f,  /x)  denote  the  block  D  after  the  iteration  of  the  partial  band-sweep  specified 
by  the  parameter  /x,  and  let  Do(f,  m)  be  the  tridiagonal  matrix  formed  from  the  tridiagonal  part  of  the  matrix 
D(f, /x).  Since  ^  =  24  corresponds  to  the  full  band-sweep,  Da(oo,  24)  is  the  matrix  we  seek  to  approximate 
by  the  partial  sweeps,  i.e..  Do  =  Do(oo, 24).  For  each  /x  =  1,2, ...,24  we  have  computed  the  normalized 
error 

||Do(f,/x)-Do(cx:,  24)11 
l|Do(c«,  24)11  ’ 

after  evaluating  the  matrix  Do(oo,  24)  to  convergence. 

This  numerical  experiment  has  been  repeated  with  10  distinct  pairs  of  D  and  E.  For  each  of  the  partial 
band-sweeps  /x  =  1,2,  3,  5,  7  and  24,  the  worst  case  error,  i.e.,  the  maximum  values  of  the  error  (24)  in  the  10 
tries,  is  plotted  against  the  number  of  iterations  I  in  Fig.  1.  The  figure  indicates  the  speed  of  convergence,  as 
the  error  curves  level  off  in  roughly  4  iterations.  For  the  full  band-sweep  (/x  =  24)  the  maximum  normalized 
error  after  8  iterations  is  only  0.0001.  The  errors  for  the  partial  band-sweeps  decrease  with  increasing  /x  for  a 
small  value  of  /x  and  then  saturate  for  n>7.  That  is,  the  gain  in  accuracy  diminishes  quickly  as  fi  increases. 
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For  the  partial  band-sweep  /j,  —  7,  the  majcimum  error  after  8  iterations  is  only  0.0048.  These  results 
demonstrate  that  a  partial  band-sweep  characterized  by  small  neighborhoods  can  approximate  the  exact 
band-sweep  accurately  (for  the  purpose  of  computing  windowed  transformed  matrix  blocks)  and  converge  in 
a  very  small  number  of  iterations. 


4.2.2  Estimation  over  1-D  space 

We  now  apply  the  partial  band-sweeps  examined  above  to  a  space-time  estimation  problem.  Consider 
estimation  of  a  temporally  evolving  random  field  /(s,  t)  over  a  1-D  space,  whose  dynamics  are  governed  by 
a  discrete  heat  equation  (Myint-U,  1980)  driven  by  white  noise  ti;(s,  i)  with  variance  q 


f{3,t)  -  f{3,t-  1)  = 

a(/(s  -i-  1,  t)  -  2/(s,  t)  +  /(s  -  1,  t)]  +  t£;(s,  t), 


based  on  a  noisy  measurement  g(a,  t)  =  /(a,  i)  -f  v{s,  t)  where  v{s,  t)  is  a  white  nosie  with  variance  r.  This 
estimation  problem  can  be  formulated  in  the  state-space  format  (6), (7)  using  the  following  sparsely  banded 
matrices  and  observation  vector 


A(t)  = 

(1  —  o)  a 

a  (1  —  2a)  a 

.  y(<)  = 

9(1. ^) 
ff(2,t) 

a  (1  —  2a)  a 

a  (1  —  a) 

.  9(ra,t)  _ 

C(t)  =  I, 


W(t)  =  ^I,  V(t)  =  4=1- 


A  random  field  and  its  noisy  observations  are  created  using  the  parameters  n  =  25,  o  =  0.4,  q  =  0.1  and  r  =  1. 
The  filtered  estimates  x(t)  are  computed  with  the  optimal  Kalman  filter,  and  they  are  compared  with  the 
approximate  estimates  x„(t)  computed  by  Algorithm  2  using  the  same  approximation  parameters  u  and  fi’s 
as  those  examined  in  Section  4.2,1.  In  particular,  the  approximate  SRI  matrix  ra(t)  is  tridiagonal  because 
u  =  1  specifies  the  nearest-neighbor  reduced-order  model.  Various  approximate  estimates  corresponding 
to  the  partial  band-sweeps  fj,  =  1,2, 3,  5,  and  7  are  computed  using  only  3  iterations  for  each  unitary 
transformation  in  (16), (17).  Averaged  over  25  repeated  experiments,  the  means  of  the  relative  temporal 
approximation  error 


IIEiCO  -  2(*) 

l|x(f)|| 


(25) 
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are  computed  over  the  first  10  time  frames  for  each  /i  and  plotted  in  Fig.  2.  For  fi  =  7,  the  average 
approximation  error  is  less  than  1%,  exemplifying  the  accuracy  of  the  proposed  approximate  SRI  filter 
algorithm  with  respect  to  the  optimal  Kalman  filter.  Similar  sets  of  experiments  conducted  with  larger 
i^’s  have  not  decreased  the  error  significantly,  suggesting  that  the  nearest-neighbor  approximation  is  quite 
adequate  for  this  particular  estimation  problem. 

4.2.3  Estimation  over  2-D  space 

When  the  spatial  domain  is  multidimensional  {K  >  1),  the  matrices  A(t),  C(t),  W(t),  V(t),  and  Ta{t) 
have  slightly  more  complex  structures  than  simple  banded  matrices.  Nevertheless,  they  are  still  sparsely 
banded  matrices  for  the  cJises  of  interest  here,  so  that  Algorithm  2  can  offer  a  high  level  of  parallelism  in 
computation.  For  example,  as  we  have  just  seen,  the  nearest-neighbor  approximation  i/  =  1  in  the  1-D  space 
case  means  just  to  window  the  SRI  matrix  r(t)  to  form  the  tridiagonal  approximation  ra(t).  When  the 
spatial  domain  is  2-D,  t/  =  1  leads  to  ra(t)  which  has  a  nested  block  tridiagonal  structure,  i.e.,  a  block 
tridiagonal  structure  in  whic  h  the  central  blocks  themselves  are  tridiagonal  while  the  first  off-diagonal  blocks 
are  diagonal.  Thus,  ra(<)  h<s  a  total  of  5  diagonal  bands.  In  general,  for  a  2-D  space  estimation  problem, 
the  reduced-order  modeling  c.pproximation  specified  by  a  given  v  yields  an  approximated  SRI  matrix  with 
2v{v-\-\)  +  l  diagonal  bands.  We  illustrate,  with  a  numerical  example,  efficiency  and  accuracy  of  the  proposed 
algorithm  for  filtering  over  such  a  spatial  domain. 

Consider  space-time  interpolation  of  f{si,S2,t)  based  on  the  smoothness  models 

f(si,S2,t)  -  f{si,S2,t- 1)  =  iu(ai,S2,t)  (26) 

/(ai,S2,t)  - /(si  -  l,S2,t)  =  5i(si,S2,i)  (27) 

f{si,32,t)  -  f{si,S2  -  l,t)  =  62(31, S2,t)  (28) 

and  noisy  measurements  ff(si,  S2,  t)  =  /(ai,  S2i  i)  +  i^(«ii  ^2,  t),  where  w(3i,  32,t),  v{si,  32,  t),  6i{si,  32,  t),  and 
52(si,  32,  t)  are  white  noise  processes  with  variances  of  q,  r,  1  and  1,  respectively.  The  models  (26), (27), (28) 
impose  both  temporal  and  spatial  smoothness  on  the  reconstructed  field,  respectively.  The  problem  can  be 
expressed  as  an  estimation  problem  based  on  the  dynamic  system  (6), (7)  using  the  matrices  and  observation 
vector 


A(t)  =  I,  W(0  =  ^I, 


where  the  components  yg{i,t)  of  the  vector  y^{t)  are  the  measurements  g(si,  S2,t)  ordered  lexicographically 
as  in  (10),  and  Si  and  S2  are  bidiagonal  matrices  representing  the  spatial  differencing  operations  along  the 
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two  spatial  axes  as  described  in  (27)  and  (28),  respectively.  An  interesting  aspect  of  this  formulation  is  that 
the  temporal  smoothness  equation  (26)  is  treated  as  system  dynamics  while  the  spatial  smoothness  equations 
(27), (28)  are  relegated  to  the  observation  equation  (7),  a  common  practice  in  visual  reconstruction  (Szeliski, 
1989,  Chin  et  ai,  1992). 

A  complete  space-time  interpolation  is  possible  using  two  Kalman  filters  running  causally  and  acausally 
in  time.  Here,  we  examine  the  causal  filter  for  q  =  0.01  and  r  =  0.25,  by  comparing  the  estimates  from 
the  optimal  Kalman  filtering  algorithm  and  Algorithm  2  using  various  partial  band-sweeps.  The  dimension 
of  the  spatial  domain  is  16  x  16,  so  that  n  =  256.  Note  that  the  exact  optimal  estimate  can  be  computed 
because  of  the  relatively  small  value  of  n.  For  the  reduced-order  model  approximation,  we  let  i/  —  2.  We 
consider  three  different  partial  band-sweeps  specified  by  fi  =  2,  3, 4.  A  noise-corrupted  surface  has  been 
reconstructed  using  the  three  approximate  SRI  filters  as  well  as  the  optimal  filter.  Fig.  3  shows  the  surfaces 
estimated  by  the  approximate  filter  using  the  partial  band-sweep  specified  by  /n  =  4.  Normalized  estimation 
errors 


||x^(t)  -x(f)|| 

Il2c(f)|| 


(30) 


where  x(t)  is  the  true  surface,  are  also  plotted  in  Fig.  4  for  all  the  estimates  for  the  first  8  time  frames.  These 
four  error  curves,  which  are  nearly  indistinguishable,  show  that  all  three  approximate  filters  have  performed 
virtually  identically  to  the  optimal  filter  for  this  problem. 


4.2.4  Vector  field 


Finally,  let  us  consider  the  case  where  the  field  variables  /(s,  t)  are  vectors.  We  focus  on  the  specific  problem 
of  estimating  motion  vectors  from  image  sequences  and  let  each  /(s,  t)  be  a  planar  motion  vector  with  two 
velocity  components.  Having  two  (instead  of  one)  unknowns  at  each  spatial  site  leads  to  corresponding 
expansions  in  the  SRI  filtering  equations;  however,  the  basic  form  of  these  equations  remains  the  same.  For 
example,  we  can  simply  treat  each  “element”  of  the  state  vector  x(t)  to  be  a  two-dimensional  column  vector, 
while  each  “element”  of  such  matrices  as  A(t)  and  r(t)  is  now  treated  as  a  2  x  2  matrix.  Algorithm  2 
applies  to  such  a  vector  field  estimation  problem  equally  well,  after  some  straightforward  adjustments  for 
the  increase  in  dimensionality.  In  particular,  while  in  the  scalar  estimation  cases  each  Givens  rotation  step 
performs  nulling  of  a  single  scalar,  in  the  vector  case  the  same  step  must  accomplish  complete  nulling  of  a 
higher  dimensional  “element”.  We  implement  each  of  these  steps  using  a  composite  of  Givens  rotations;  if, 
for  example,  the  “element”  consists  of  4  scalars,  an  aggregate  of  4  Givens  rotations  are  used  to  null  it.  To 
illustrate  how  to  null  one  2x2  “element”  against  another  let 


d  = 


di 

dz 


d2 

d^ 


e  = 


ex  e2 

63  64 
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and  consider  using  a  unitary  operation  to  null  e  against  d  in  the  matrix 

di  (^2 

<^3  ^4 

ei  62 
63  64 

This  task  can  be  accomplished  by  a  two-step  process:  First  use  d  to  null  the  first  row  [ei  ea]  of  e;  then,  use 
d  again  to  null  the  second  row  [es  64].  Each  of  these  steps  can  be  performed  by  essentially  a  sequence  of  two 
Givens  rotations.  Specifically,  let 


Cl  Si 

1 _ 

to 

d'^  d'4 

II 

1 

C2  S2 

ds  d4 

-Si  Cl 

-S2  C2 

ei  62 

where  cj,  =  cos  6k  and  s*  =  sin  for  i:  =  1, 2.  The  desired  sequence  of  two  rotations  64,62  can  be  obtained 
from  the  two  equations  e'j  =  0  and  e'j  =  0.  In  practice,  it  tends  to  be  easier  to  solve  for  the  c^’s  and 
Sfc’s  directly,  aided  by  the  trigonometric  identity  +  si  =  1.  Efficient  implementation  of  the  Givens 
rotations  themselves  is  beyond  the  scope  of  this  paper.  References  on  this  general  topic  include  (Gotze  and 
Schwiegelshohn,  1991,  Golub  and  van  Loan,  1989). 

Modifying  the  algorithm  as  above,  we  have  calculated  motion  vector  fields  from  an  image  sequence. 
The  estimation  problem  is  formulated  based  on  the  image  brightness  conservation  approach  studied  by 
Horn  and  Schunck  (1981)  in  conjunction  with  the  space-time  smoothness  models  (26), (27), (28)  presented  in 
Section  4.2.3.  Specifically,  by  assuming  brightness  conservation,  we  can  obtain  a  relationship  between  the 
two-dimensional  motion  vector  f{si,32,t)  and  spatio-temporal  gradients  of  the  image  brightness  (intensity) 
4>  as 


||(si,S2,t)  —  [  §^{si,S2,t)  §^{si,S2,t)  j/(si,S2,t),  (31) 

which  we  write  as  9(si,S2,t)  =  h{si,s2,t)  f{si,S2,t)  by  letting  g(si,S2)f)  and  h{si,S2,t)  be  the  scalar  on 
the  left  hand  side  and  the  first  vector  on  the  right  hand  side  of  (31),  respectively.  Because  of  non-ideal 
conditions  (e.g.,  measurement  noise)  in  practice,  this  brightness  conservation  is  not  expected  to  be  satisfied 
exactly  at  every  site.  Thus,  we  assume  a  more  appropriate  relationship 

5(si,  S2,t)  =  h{si,  S2,t)  f{si,S2,t)  +  v{si,S2,t)  (32) 

where  the  white  noise  u(si,  S2if)  represents  the  uncertainty  in  the  brightness  conservation  with  respect  to  the 
particular  measurements  of  the  brightness  gradients.  (The  brightness  gradients  are  actually  compuUd  from 
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the  measurements  of  ^(si,  sj,  t)  by  finite  differencing.)  The  observation  equation  (32)  is  complemented  by  the 
smoothness  models  (26), (27), (28)  which  are  necessary  for  the  motion  estimation  problem  to  be  well-posed 
(Horn  and  Schunck,  1981,  Chin  ei  al.,  1992).  The  system  matrices  for  (32), (26), (27), (28)  are,  therefore, 
essentially  vector-field  versions  of  (29),  except  that  C{t)  is  now  a  time-varying  matrix  given  by 


C(i)  = 


H(t) 

51 

52 


(33) 


where  H(<)  is  a  block-diagonal  matrix  whose  diagonal  blocks  are  h{si,S2,  t).  The  time- varying  nature  of  the 
system  necessitates  a  completely  “on-line”  implementation  of  the  corresponding  Kalman  filter. 

Algorithm  2  has  been  applied  to  an  image  sequence  (Fig.  5)  describing  a  simulated  fluid  flow  to  yield 
the  estimated  flow  field  shown  in  Fig.  6,  indicating  adequate  performance  of  the  filtering  algorithm.  Details 
of  this  particular  motion  estimation  problem  can  be  found  in  (Chin  et  al.,  1992).  From  a  computational 
perspective,  the  small  image  frame  size  (64  x  48)  used  in  this  simulation  is  still  large  enough  to  make  an 
exact,  optimal  implementation  of  the  Kalman  filter  impractical.  Algorithm  2,  however,  enables  computation 
of  near-optimal  estimates.  The  estimated  flow  field  in  Fig.  6  is  obtained  by  using  a  spatially  distributed 
computational  structure  over  small  neighborhoods  defined  hy  v  =  =  2  and  performing  only  3  partial 

band-sweeps  per  unitary  transformation.  To  invert  (15),  100  iterations  of  the  standard  Jacobi  iterations 
(Golub  and  van  Loan,  1989)  are  used. 


5  Conclusion 

The  two  main  developments  of  this  paper  are  a  novel  approach  to  the  unitary  transformations  in  square  root 
filtering,  in  which  the  computation  can  be  performed  in  a  finely  distributed  manner,  and  an  approximation 
technique  for  large  dimensional  space-time  SRI  filtering  problems  based  on  the  new  unitary  transformation. 
The  proposed  approximate  SRI  filter  has  two  levels  of  approximations  characterized  by  two  neighborhood 
sets.  The  computational  efficiency  and  near-optimality  of  the  filter  have  been  demonstrated  numerically. 
Systematic  selection  of  the  approximation  parameters,  such  as  the  neighborhood  sets  and  the  number  of 
iterations,  to  approximate  an  arbitrary  space-time  problem  to  a  given,  desired  accuracy  is  an  obvious  direction 
for  further  research. 

Actual  parallel  hardware  implementation  of  the  reduced-order  filter  (Algorithm  2)  also  remains  as  future 
work.  A  design  based  on  a  layered  data  mesh,  in  which  the  SRI  pair  and  system  parameters  are  stored  and 
band-sweeps  are  performed,  has  been  explored  in  (Chin  et  al.,  1993)  for  specific  space-time  problems. 

The  development  of  this  paper  was  premised  on  the  use  of  local  interaction  models  (e.g.  partial  dif¬ 
ferential  equation  models)  coupled  with  local  observations  of  these  phenomena.  While  many  space-time 
estimation  problems  fit  these  requirements,  there  are  also  noTi-locally  specified  problems  (e.g.,  tomographic 
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meeisurements  used  in  medicine,  geophysics,  and  oceanography)  of  practical  importance.  An  interesting  and 
open  question  is  how  to  efficiently  approximate  and  propagate  the  SRI  pair  for  such  non-local  cases. 
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Appendix:  Proof  of  Theorem  1 


Let  D(A:)  and  E(fc)  be  the  blocks  D  and  E,  respectively,  of  the  matrix  (18)  after  the  A***  application  of 
the  Givens  rotation.  Also,  let  dij{k)  and  e,j(A)  be  the  elements  of  D(A)  and  E(A),  respectively.  A  unitary 
transformations  like  the  Givens  rotation  preserves  the  2-norm  of  the  operand  vector  (Horn  and  Johnson, 
1985);  thus,  when  a  Givens  rotation  is  applied  to  the  f***  row  of  D(A)  and  row  of  E(A)  we  have 

djj{k  +  1)  -I-  ejjik  -b  1)  =  djj{k)  -f  e^^ik),  (34) 


for  j  =  1, . . .,  n.  Equivalently,  the  sum  of  the  squares  of  the  elements  of  the  matrix  (18)  stays  constant,  i.e.. 


D 

2 

■  D(A)  1 

E 

F 

.  e(a)  j 

where  the  subscript  F  denotes  the  Frobenius  norm  (Golub  and  van  Loan,  1989). 
Let  us  consider  the  sequence  {'^>j(^)}^o  arbitrary  j  =  1, . . . ,  n.  Since 


4.(A)  <  ||D(A)||^  < 


D(A) 

E(A) 


VA, 


(35) 


(35)  implies  that  the  sequence  is  bounded  from  above  by  c.  Also,  in  a  sweep,  an  element  in  the  column  j 
of  the  block  E(A)  can  be  nulled  only  against  the  element  djj{k).  That  is,  when  f  =  j  in  (34),  the  Givens 
rotation  makes  e|^  (A  -f  1)  zero,  or 


4(A  +  l)=4.(A)+e?,.(A).  (36) 

This  implies  that  the  sequence  (A)}^^  is  a  non-decreasing  sequence  for  each  j.  Now,  since  the  sequence 
is  both  upper-bounded  and  non-decreasing,  it  must  converge.  From  (36),  a  necessary  condition  for  the 
sequence  to  converge  for  each  j  must  be  limit_oo  e,j(A)  =  0,Vi.  This  is  because  every  element  c,y  is  nuUed 
once  in  an  iteration  of  sweep,  i.e.,  each  element  is  nulled  once  every  (pn)***  application  of  Givens  rotation  on 
the  average.  Thus,  we  have  limfc_oo  ilE(A)||  =  0,  and  the  block  E  must  be  nulled  in  the  limit. 
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Figure  1:  The  errors  in  iterative  unitary  transformations  using  partial  band-sweeps  described  in  Section  4.2.1 
The  five  dotted  lines  from  top  to  bottom  correspond  to  the  partial  band-sweeps  ^  =  1,  2,  3,  5  and  7,  respec 
lively.  The  solid  line  is  the  error  for  the  full  band-sweep  [n  —  24). 


Figure  3;  Reconstructed  surfaces  at  t  =  2,4,  and  6  (top)  and  the  actual  surface  (bottom). 
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normalized  error 


Figure  4:  Normalized  estimation  errors  for  the  first  8  time  frames  for  the  optimal  filter  and  approximate 
filters  with  fj,  =  2,3  and  4.  The  four  error  curves  are  virtually  indistinguishable. 


Figure  5;  Four  images  from  the  sequence  used  in  the  motion 
7  (top  row)  as  well  as  14  and  21  (bottom  row)  are  shown. 


vector  computation  experiment.  Frames  0  and 
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Figure  6:  An  optical  flow  field  computed  by  processing  10  frames  of  images  with  the  reduced-order  SRI  filter 
(left)  and  the  corresponding  true  flow  (right).  Every  other  flow  vectors  are  shown  for  clarity. 
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